Abstract. We introduce the notion of unavoidable (complete) sets of word patterns, which
refines the notion of unavoidable sets of words, and study certain numerical characteristics of unavoidable
sets of patterns. In some cases we employ the graph of pattern overlaps introduced in this 
paper, which is a subgraph of the de Bruijn graph and which we prove to be Hamiltonian. 
In other cases we reduce a problem under consideration to known facts on unavoidable sets 
of words. We also give a relation between our problem and intensively studied universal 
cycles, and prove there exists a universal cycle for word patterns of any length over any 
alphabet. 
1. Introduction

When defining or characterizing sets of objects in discrete mathematics, "languages of 
prohibitions" are often used to define a class of objects by listing the prohibited subobjects, 
i.e. subobjects that are not allowed to be contained in the objects of the class. The notion 
of a subobject is defined in different ways depending on the objects under consideration: a 
subword (a block or segment) for fragmentarily restricted languages, a subgraph for families 
of graphs, a subshape for two-dimensional shapes (e.g. a submatrix for matrices) and so on. 

We collect all prohibited objects into a set that we call a set of prohibited objects, or 
simply a set of prohibitions. The idea of an unavoidable (or complete*) set is as follows: if there
exists a restriction on the size of an object, in other words, if large enough objects must
contain prohibited subobjects, then the set of prohibitions is unavoidable.

*The word "complete" appears in, e.g., [?], but the word "unavoidable" is in common use in
contemporary literature (e.g. see [?, Chapter 3], [?]), so we decided to use the more recent
terminology in this paper.
In this paper, we are interested in unavoidable sets of word patterns, or just patterns
(see Section 3 for definitions). These patterns are an extension of the permutation patterns
studied extensively for the last twenty years (see [?] for a survey of the corresponding
problems). Our unavoidable sets of patterns are refinements of those of words. Questions
on unavoidability of sets of words appear, for instance, in algebra (sequences without
repetitions), coding theory (chain codes), number theory (arithmetic progressions in partitions
of the set of natural numbers), and dynamical systems (motions of an object in a space with
certain restrictions).

There are a number of numerical characteristics that are valuable for unavoidability criteria
and the recognition algorithms based on them. Three such characteristics, namely $M_w(n)$,
$L_w(n)$ and $C_w(n)$ (for definitions see Section 2), are considered in [6]. We consider the
similar characteristics $M_p(n,m)$, $L_p(n,m)$ and $C_p(n,m)$ for the case of prohibited patterns
(for definitions see Section 3), where $m$ is the number of letters in the corresponding alphabet
(we do not use this parameter for the functions $M_w(n)$, $L_w(n)$ and $C_w(n)$ to be consistent
with [6]). Moreover, in Subsubsection 3.2.2 we discuss how finding a lower bound for $C_p(n,m)$
is related to the so-called universal cycles for combinatorial structures that have been studied
intensively (e.g. see [?] and references therein). To get the lower bound, we prove that
the graph of pattern overlaps (see the definition in Section 3) is Hamiltonian, and derive as a
corollary that there exists a universal cycle for word patterns of any length over any alphabet
(see Corollary 3.11).

We remark that when considering patterns, the underlying alphabet must be ordered, as
opposed to the objects considered in [6].

The paper is organized as follows. In Section 2 we review the main results on unavoidable
sets of words in [6, 7]. The motivation for a relatively detailed review of these papers is the
fact that they are available only in Russian (as far as we know), which caused, in particular,
the rediscovery of some of those results in [?]. Besides, the results obtained in [6, 7] are of
great interest in general and very useful in this paper in particular. In Section 3 we define
the notions of a pattern and an $n$-pattern word, and study unavoidable sets of patterns.

2. Unavoidable sets of words 

Let $A = \{a_1, \ldots, a_n\}$ be an alphabet of $n$ letters. A word over the alphabet $A$ is a finite
sequence of letters of the alphabet. Any $i$ consecutive letters of a word $X$ generate a subword
of length $i$. The set $A^*$ is the set of all words over the alphabet $A$, and $A^n$ is the set of all
words over $A$ of length $n$. Let $S \subseteq A^*$ be a set of prohibited words or a set of prohibitions.
A word that does not contain any words from $S$ as its subwords is said to be free from $S$ or
$S$-free. The set of all $S$-free words is denoted by $\hat{S}$.

If there exists a natural number $k$ such that the length of any word in $\hat{S}$ is less than $k$,
then $S$ is called an unavoidable set. It is straightforward to see that $S$ is unavoidable if
and only if $\hat{S}$ has finitely many elements. Thus, for any unavoidable set $S$ we can define
the function

$$L_w(S) = \max_{X \in \hat{S}} \ell(X),$$

where $\ell(X)$ is the length of a word $X$.

The basic problem in considering sets of prohibitions is whether or not a given set $S$
of prohibitions is unavoidable. Other possible questions are: given an unavoidable $S$, find or
estimate $L_w(S)$; construct an $S$-free word of length $L_w(S)$; find the number of $S$-free words.
If $S$ is avoidable, then some possible questions are: find an infinite $S$-free sequence; describe
all such sequences; find the cardinality of the set of these sequences; find the cardinality of
the set of finite $S$-free sequences of a given length.

Let $S$ be a finite set of words over an alphabet $A$, and let $n$ be the maximal length of a
word in $S$. If a word $X$ is a subword of a word $Y$, then we say that $Y$ is a superword for
$X$. Suppose now that a word $X \in S$ and $\ell(X) < n$. Remove $X$ from $S$ and adjoin to $S$
all superwords for $X$ of length $n$. If this procedure is performed for every such $X$, and all
resulting repetitions are removed, we will get a set $S'$ of distinct words of length $n$.

Proposition 2.1. ([6, Proposition 1]) S is unavoidable iff S' is unavoidable. 

Thus, sets of prohibitions $S \subseteq A^n$ are of special interest, and for the most part, our
considerations in this paper are related to these sets. More precisely, we will consider the
functions

$$M_w(n) = \min |S| \quad \text{and} \quad L_w(n) = \max L_w(S),$$

where the extremum is taken over all unavoidable $S \subseteq A^n$. These functions
are examples of numerical characteristics that describe the boundary between avoidable and
unavoidable sets of prohibitions. To give an instance of such a boundary, we consider the
following example.

Example 2.2. ([?, Examples 1, 2]) Consider $A = \{0, 1\}$ and the sets of prohibitions

$$S_1 = \{000, 001, 1011, 0101, 1111\},$$
$$S_2 = \{000, 001, 1010, 0101, 1111\}.$$

Thus $S_1$ and $S_2$ differ in only one letter (the last letter of the third word). One can see that
$S_1$ is unavoidable, and $L_w(S_1) = 8$. On the other hand, $S_2$ is avoidable. Indeed,

$$011011\ldots \quad \text{and} \quad 01110111\ldots$$

are $S_2$-free, and

$$0110111 \quad \text{and} \quad 0111011$$

are $S_2$-free. Hence, substituting $1 \to 011$ and $0 \to 0111$ in any sequence over $A$, we get an
$S_2$-free sequence. Hence, the set of $S_2$-free sequences has the cardinality of the continuum.
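The claims of the example can be checked by exhaustive search. The following is a minimal sketch, not taken from the paper: the function name `longest_free_word` is hypothetical, and the stopping rule relies on the pigeonhole observation that a free word of length $|A|^n + n$ must repeat a factor of length $n$ and can therefore be pumped into an infinite free sequence.

```python
def longest_free_word(prohibited, alphabet="01"):
    """Return a longest word avoiding `prohibited`, or None if the set is avoidable.

    Hypothetical helper: free words are grown one letter at a time, so only
    suffixes of the extended word need to be tested against the prohibitions.
    """
    n = max(len(p) for p in prohibited)
    bound = len(alphabet) ** n + n  # a free word this long repeats a factor of length n
    level = [""]                    # all free words of the current length
    longest = ""
    for _ in range(bound):
        level = [w + a for w in level for a in alphabet
                 if not any((w + a).endswith(p) for p in prohibited)]
        if not level:
            return longest          # no free word of the next length exists
        longest = level[0]
    return None                     # free words of length `bound` exist: avoidable


S1 = {"000", "001", "1011", "0101", "1111"}
S2 = {"000", "001", "1010", "0101", "1111"}
print(longest_free_word(S1))
print(longest_free_word(S2))
```

For $S_1$ the search stops after finitely many steps, while for $S_2$ it reaches the pigeonhole bound and reports avoidability.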

In what follows, we will need the following graph. A de Bruijn graph is a directed graph
$G_n = G_n(V, E)$, where the set of vertices $V$ is the set of all words in $A^n$, and there is an arc
from $u \in A^n$ to $v \in A^n$ if and only if

$$u = aw \quad \text{and} \quad v = wb \quad \text{for some } w \in A^{n-1} \text{ and } a, b \in A.$$

Figure 1 shows the de Bruijn graphs for a 2-letter alphabet and $n = 2, 3$.

Figure 1. The de Bruijn graphs for the alphabet $A = \{0, 1\}$ and $n = 2, 3$.

The de Bruijn graphs were first introduced (for the alphabet $A = \{0, 1\}$) by de Bruijn in
1944 for finding the number of code cycles. However, these graphs proved to be a useful tool
for various problems related to combinatorics on words (e.g. see [?]). It is known that
the graph $G_n$ can be defined recursively as $G_n = L(G_{n-1})$, where $L$ indicates the operation
of taking the line graph.
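The recursive description can be illustrated on a small case. The sketch below is an illustration only (the helper names `de_bruijn_arcs`, `line_graph_arcs` and `arc_to_word` are hypothetical): it builds the arcs of $G_2$ and $G_3$ over $\{0, 1\}$ and checks that relabelling each arc $(aw, wb)$ of $G_2$ by the word $awb$ turns the line graph of $G_2$ into $G_3$.

```python
from itertools import product


def de_bruijn_arcs(alphabet, n):
    """Arcs (u, v) of G_n: u = aw and v = wb with |w| = n - 1."""
    words = ["".join(t) for t in product(alphabet, repeat=n)]
    return {(u, u[1:] + b) for u in words for b in alphabet}


def line_graph_arcs(arcs):
    """Arcs of the line graph: (e, f) whenever the head of e is the tail of f."""
    return {(e, f) for e in arcs for f in arcs if e[1] == f[0]}


def arc_to_word(arc):
    """Label the arc (aw, wb) by the word awb of length n + 1."""
    u, v = arc
    return u + v[-1]


g2 = de_bruijn_arcs("01", 2)
g3 = de_bruijn_arcs("01", 3)
relabelled = {(arc_to_word(e), arc_to_word(f)) for e, f in line_graph_arcs(g2)}
print(relabelled == g3)
```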

A chord of a directed simple path $P$ in $G_n$ is an arc that does not belong to $P$ but connects
two of its vertices in such a way that there is a circuit generated by this arc and the part
of the path between the ends of the arc. For instance, in Figure 2 the arc $BA$ is a chord for
the path $P$, whereas $AB$ is not.

Let $C_w(n)$ denote the greatest length (the number of vertices) of a simple path in $G_n$ that
does not have chords and does not go through any vertex that has a loop. The following
theorem was proved by considering the de Bruijn graph.

Theorem 2.3. ([6, Theorem 1]) $L_w(n) = C_w(n) + n - 1 = |A|^{n-1} + n - 2$.

Figure 2. The arc $BA$ is a chord for the path $P$, but $AB$ is not.

The following theorem was proved using the cyclic structure of the de Bruijn graph (the
main result of [?]) as well as the number of conjugacy classes of words with respect to a
cyclic shift.

Theorem 2.4. ([6, Theorem 2])

$$M_w(n) = \frac{1}{n} \sum_{d \mid n} \varphi(n/d)\, |A|^d,$$

where $\varphi(n)$ is the number of integers in $\{1, 2, \ldots, n\}$ relatively prime to $n$ (Euler's
$\varphi$-function).

Since any set of prohibitions $S$ with $|S| < M_w(n)$ is avoidable, it is helpful to have a table
for $M_w(n)$. For $|A| = 2$ and $2 \le n \le 10$, see Table 1.

n         2   3   4   5   6    7    8    9    10
M_w(n)    3   4   6   8   14   20   36   60   108

Table 1. The function $M_w(n)$ for $2 \le n \le 10$ and a 2-letter alphabet.
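The values of $M_w(n)$ can be recomputed directly from the formula of Theorem 2.4. The sketch below is illustrative (the names `euler_phi`, `m_w` and `count_conjugacy_classes` are hypothetical); it also counts conjugacy classes of words under cyclic shift by brute force, which is the quantity the theorem's proof is said to rely on.

```python
from itertools import product
from math import gcd


def euler_phi(n):
    """Number of integers in {1, ..., n} relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def m_w(n, q=2):
    """Evaluate (1/n) * sum over d | n of phi(n/d) * q^d for a q-letter alphabet."""
    total = sum(euler_phi(n // d) * q ** d for d in range(1, n + 1) if n % d == 0)
    return total // n


def count_conjugacy_classes(n, alphabet="01"):
    """Brute-force count of classes of words of length n under cyclic shift."""
    seen, classes = set(), 0
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        if word not in seen:
            classes += 1
            seen.update(word[k:] + word[:k] for k in range(n))
    return classes


print([m_w(n) for n in range(2, 11)])
```

The brute-force count is only practical for small $n$, whereas the closed formula is cheap to evaluate.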



In particular, any set of binary words of length 9 that has fewer than 60 words is avoidable.
Also, it is obvious that $M_w(n) \sim |A|^n / n$ as $n \to \infty$. The last observation allows us to
prove the following statement.

Proposition 2.5. ([7, Proposition 1]) There exist at least $2^{|A|^n (1 - \varepsilon_n)}$ unavoidable sets
$S \subseteq A^n$. Here $\varepsilon_n \to 0$ as $n \to \infty$.

3. Unavoidable sets of patterns 

The alphabets considered in this section must be totally ordered, and without loss of
generality they coincide with $[m] = \{1, 2, \ldots, m\}$ for an appropriate $m$.

We refer to [?] for a general survey of various pattern problems. However, in this paper
we are concerned only with word patterns, studied for the first time in [2]. More precisely, we
consider the word patterns without internal dashes (see [?]). For this paper, we can define
a pattern to be a subword (of a word) that contains each of the letters $1, 2, \ldots, k$ at least
once for some $k$, and no other letters. For instance, the word 2613235 contains an occurrence
of the pattern 1323, but its subword 2613 is not a pattern. By analogy with Section 2, if a
word does not contain a pattern $p$, it is free from $p$ or $p$-free. However, the crucial difference
between this section and Section 2 is that instead of considering words free from a pattern
$p$, we consider objects that we call $n$-pattern words. An $n$-pattern word is a word in
which each subword of length $n$ is a pattern. Thus, in constructing $n$-pattern words, we can
restrict ourselves to alphabets having at most $n$ letters. Indeed, an occurrence of a letter
$m > n$ in a subword $A$ of length $n$ of an $n$-pattern word $W$ contradicts the fact that $A$ must
be a pattern ($A$ would have to contain each of the letters $1, 2, \ldots, m$, which is impossible
for a word of length $n < m$).
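The two definitions translate directly into code. The following sketch is an illustration under the assumption that words are represented as tuples of positive integers; the function names `is_pattern` and `is_n_pattern_word` are hypothetical.

```python
def is_pattern(word):
    """True if `word` uses exactly the letters 1, 2, ..., k for some k >= 1."""
    letters = set(word)
    return len(word) > 0 and letters == set(range(1, len(letters) + 1))


def is_n_pattern_word(word, n):
    """True if `word` has length at least n and every subword of length n is a pattern."""
    return len(word) >= n and all(
        is_pattern(word[i:i + n]) for i in range(len(word) - n + 1)
    )


word = (2, 6, 1, 3, 2, 3, 5)
print(is_pattern(word[2:6]))   # the subword 1323
print(is_pattern(word[0:4]))   # the subword 2613
print(is_n_pattern_word((1, 2, 1, 3), 3))
```

Here the subword 1323 is recognised as a pattern and 2613 is rejected, as in the text; the word 1213 is a 3-pattern word because both 121 and 213 are patterns.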

By analogy with Section 2, when dealing with sets of prohibited words, we can consider sets
of prohibited patterns, or simply sets of prohibitions, when it is clear which prohibitions we
mean. We can also define the notion of an unavoidable set here in the same way. However,
in considering prohibited patterns and $n$-pattern words, we assume that all prohibitions
are of length $n$. Hence, for patterns, we can define the functions $L_p(n,m)$ and $M_p(n,m)$
similarly to $L_w(n)$ and $M_w(n)$ (recall that $m$ is the number of letters in the alphabet). As in
Section 2, the basic problem is whether or not a given set $S_p$ of prohibitions is unavoidable,
and $L_p(n,m)$ and $M_p(n,m)$ are important numerical characteristics to study.

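3.1. The function $M_p(n,m)$. Recall that the Möbius function is defined by

$$\mu(n) = \begin{cases} 1, & n = 1, \\ (-1)^k, & n \text{ is a product of } k \text{ distinct primes}, \\ 0, & \text{otherwise}. \end{cases}$$

Theorem 3.1.

$$M_p(n,m) = \sum_{i \mid n} \sum_{j=0}^{\min(i,m)-1} (-1)^j \binom{\min(i,m)-1}{j} \frac{1}{i} \sum_{d \mid i} \mu(d) (\min(i,m)-j)^{i/d},$$

where $M_p(n,m) = \min |S_p|$, and the minimum is taken over all unavoidable sets $S_p$ of
patterns of length $n$ over the alphabet $[m]$.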

One can compare this result with that of Theorem 2.4.

Remark 3.2. In Theorem 3.1 we can assume that $n \ge m$, since if $n < m$ we can only use
the first $n$ letters in $[m]$ to construct $n$-pattern words, which reduces to the case $n = m$.

Remark 3.3. For $n = m$, we have $\min(i, m) = i$ in the formula of Theorem 3.1.

To prove Theorem 3.1 we introduce the graph of pattern overlaps $P_n = P_n(V, E)$, which
is a subgraph of the de Bruijn graph $G_n$, where the set of vertices $V$ contains all $n$-letter
patterns over the underlying alphabet $A$, and the set of arcs $E$ consists of all the arcs of $G_n$
between vertices corresponding to the patterns. In Figure 3 we can see the graph of pattern
overlaps in the case of a 3-letter alphabet and $n = 3$ (we omit parentheses around the triples
on the graph to indicate that we are dealing with $P_3$, not $G_3$).

Let $T_p(n,m)$ denote the number of conjugacy classes of patterns of length $n$ over the
alphabet $[m]$ with respect to a cyclic shift. For instance, there are 5 conjugacy classes in Figure 3.
They are $\{111\}$, $\{112, 121, 211\}$, $\{221, 212, 122\}$, $\{321, 213, 132\}$ and $\{312, 123, 231\}$. Thus,
$T_p(3,3) = 5$.
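The conjugacy classes listed above can be enumerated by brute force. This is a minimal sketch under the assumption that patterns are tuples over $\{1, \ldots, m\}$; the names `is_pattern` and `pattern_conjugacy_classes` are hypothetical.

```python
from itertools import product


def is_pattern(word):
    """True if `word` uses exactly the letters 1, 2, ..., k for some k >= 1."""
    letters = set(word)
    return len(word) > 0 and letters == set(range(1, len(letters) + 1))


def pattern_conjugacy_classes(n, m):
    """Set of classes (frozensets of cyclic shifts) of patterns of length n over [m]."""
    classes = set()
    for word in product(range(1, m + 1), repeat=n):
        if is_pattern(word):
            # Every cyclic shift of a pattern uses the same letters, so it is a pattern too.
            classes.add(frozenset(word[k:] + word[:k] for k in range(n)))
    return classes


classes = pattern_conjugacy_classes(3, 3)
for cls in sorted(classes, key=lambda c: sorted(c)):
    print(sorted(cls))
```

The enumeration is exponential in $n$, so it is only meant for checking small cases such as $T_p(3,3)$.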



Figure 3. The graph of pattern overlaps for $A = \{1, 2, 3\}$ and $n = 3$.

Lemma 3.4. $M_p(n,m) = T_p(n,m)$.

Proof. To prove the lemma, we follow the proof of Theorem 2.4 in [6].

Suppose $S_p$ is an unavoidable set of patterns of length $n$ and $X$ is an arbitrary $n$-pattern
word of length $n$ ($X$ is a pattern) over $[m]$. We form the sequence

$$X^\infty = XXX\ldots,$$

by repeating the word $X$ periodically. Since $S_p$ is unavoidable, $X^\infty$ contains a prohibited
pattern $p \in S_p$. From the construction of the sequence, $p$ is either $X$ or a cyclic shift of $X$.
Thus $S_p$ contains a pattern from each conjugacy class of patterns of length $n$ over $[m]$ with
respect to a cyclic shift. Thus, $|S_p| \ge T_p(n,m)$, and since $S_p$ is an arbitrary unavoidable set, we have

$$M_p(n,m) \ge T_p(n,m).$$

To prove that $T_p(n,m)$ is an upper bound, we need to find an unavoidable set of cardinality
$T_p(n,m)$. We consider the graph $P_n$ whose vertices correspond to the patterns over $[m]$. If
$V' \subseteq V(P_n)$ and each circuit of $P_n$ contains a vertex in $V'$, then we say that $V'$ cuts all
circuits of $P_n$. By deleting the vertices of such a $V'$ with all incident arcs from $P_n$, we get an acyclic graph
on the vertex set $V \setminus V'$. The set of the patterns in $[m]^n$ corresponding to the vertices in $V'$
is unavoidable. Indeed, if not, a sequence free from $V'$ determines a self-intersecting walk in
$P_n$ and thus generates a circuit on the vertex set $V \setminus V'$, which is impossible.

Golomb [?] found a set of vertices $V_c$ that cuts all circuits of the de Bruijn graph $G_n$
with $|V_c|$ equal to the number of conjugacy classes of the words. Thus $V_c$ cuts all circuits
in $G_n$ and has one vertex in each conjugacy class. Since $P_n$ is a subgraph of $G_n$, $P_n$ will
have no circuit after removing the vertices in $V_c$. The set of vertices in $V_c$ that belong to $P_n$
corresponds to an unavoidable set, and thus

$$M_p(n,m) \le T_p(n,m).$$

This proves the lemma. □
Lemma 3.5.

$$T_p(n,m) = \sum_{i \mid n} \sum_{j=0}^{\min(i,m)-1} (-1)^j \binom{\min(i,m)-1}{j} \frac{1}{i} \sum_{d \mid i} \mu(d) (\min(i,m)-j)^{i/d}.$$

Proof. Recall that a word $x \in A^*$, where $A$ is any (ordered or unordered) alphabet, is called
primitive if it is not a power of another word. Thus a nonempty word $x$ is primitive if $x = y^e$ holds only for
$e = 1$. For instance, the words 121, 1221, 12121 are primitive, whereas the word 121212
is not. It is easy to show that each nonempty word is a power of a unique primitive word.
Thus, $x = r^e$ for a unique primitive word $r$. The number $e$ is called the exponent of $x$. It is
also easy to see that all words, and hence all patterns, in the same conjugacy class have the
same exponent. Moreover, if $x_1 = r_1^{e_1}$ and $x_2 = r_2^{e_2}$ and $|x_1| = |x_2|$, then $x_1$ is conjugate to $x_2$
iff $r_1$ is conjugate to $r_2$. We define the notion of a primitive pattern in the same way as for
words. Clearly, all properties of primitive words hold for primitive patterns as well.

So, in order to find $T_p(n,m)$, we need to find the number of conjugacy classes of primitive
patterns of length $i$ over the alphabet $[m]$, where $i \mid n$, and then take the sum of these numbers.
However, for a given $i$, we cannot directly use the well-known formula for the number of
conjugacy classes of primitive words over a $\min(i,m)$-letter alphabet (a primitive word of
length $i$ can have at most $i$ distinct letters, since we are dealing with patterns), given by

$$\frac{1}{i} \sum_{d \mid i} \mu(d) (\min(i,m))^{i/d}.$$

Indeed, this formula counts, among others, primitive words which are not primitive patterns
(when some letter $j$, $2 \le j \le \min(i,m) - 1$, occurs in a primitive word whereas $j - 1$
does not). So, we need to use the standard inclusion-exclusion method (the sieve formula)
to handle this situation. We define the property $A_j$ to be "the letter $j$ does not occur in
a primitive word". Clearly we may restrict ourselves to the case $j \le \min(i,m) - 1$, since
the absence of the largest letter, namely $\min(i,m)$, is not a bad property when considering
patterns. Now we easily get the number of conjugacy classes of primitive patterns of length $i$, which is given by

$$\sum_{j=0}^{\min(i,m)-1} (-1)^j \binom{\min(i,m)-1}{j} \frac{1}{i} \sum_{d \mid i} \mu(d) (\min(i,m)-j)^{i/d}.$$

This proves the lemma. □

Now the truth of Theorem 3.1 follows from Lemmas 3.4 and 3.5.

3.2. The function $L_p(n,m)$. Let $C_p(n,m)$ denote the greatest length (the number of
vertices) of a simple path in $P_n$ that does not have chords (see the definition in Section 2) and
does not pass through any vertex incident with a loop. Using exactly the same considerations
as in the proof of Theorem 2.3 (see [6]), one can prove the following theorem.

Theorem 3.6. $L_p(n,m) = C_p(n,m) + n - 1$.

Moreover, in the case $m = 2$, the de Bruijn graph $G_n$ almost coincides with the graph of
pattern overlaps $P_n$. Indeed, the only difference between these graphs is the vertex $(22 \ldots 2)$
and all arcs incident with that vertex ($22 \ldots 2$ is the only binary non-pattern). However, the
lemma to Theorem 2.3 (see [6]) provides that in the binary case $C_w(n) = 2^{n-1} - 1$, and since
$C_w(n)$ is the maximal length of a path that, in particular, does not pass through the loop
$(22 \ldots 2)$, we have that in this case $C_w(n) = C_p(n,2)$. Thus the following theorem is true:

Theorem 3.7. $L_p(n,2) = 2^{n-1} + n - 2$.
However, in the case $m \ge 3$, the only useful information we can extract from Theorem 2.3
is the following rough bound

$$L_p(n,m) \le m^{n-1} + n - 2.$$

So, according to Theorem 3.6, we need to find $C_p(n,m)$ in order to get $L_p(n,m)$. The purpose
of the rest of the subsection is to find an upper and a lower bound for $C_p(n,m)$ for $m \ge 3$.

3.2.1. An upper bound for $C_p(n,m)$. We only give a trivial upper bound. Clearly, in order
to avoid chords, each conjugacy class (with respect to shift) which has $i$ words can have no
more than $i - 1$ words in the path. Thus, we use the formula for $T_p(n,m)$ with a correction,
namely the factor of $i - 1$, which indicates that each primitive word of length $i$ is responsible
for a conjugacy class of $i$ elements, and we take $i - 1$ elements out of these $i$:

$$C_p(n,m) \le \sum_{i \mid n} (i-1) \sum_{j=0}^{\min(i,m)-1} (-1)^j \binom{\min(i,m)-1}{j} \frac{1}{i} \sum_{d \mid i} \mu(d) (\min(i,m)-j)^{i/d}.$$

3.2.2. A lower bound for $C_p(n,m)$. We observe that the line graph $L(P_{n-1})$ of the graph
$P_{n-1}$ determines a subgraph of the graph $P_n$. We get this by using the general properties of
the de Bruijn graph (since $P_n$ is its subgraph), as well as the fact that if $x_1x_2 \cdots x_{n-1}$ and
$x_2x_3 \cdots x_n$ are vertices in $P_{n-1}$, then the arc between them generates the vertex $x_1x_2 \cdots x_n$
in the line graph, and $x_1x_2 \cdots x_n$ is a pattern and thus belongs to $P_n$. Moreover, from the
considerations in the proof of Theorem 2.3 (see [6]), it follows that a simple path in $P_{n-1}$
determines a simple path without chords in $P_n$ after removing the loop $11 \ldots 1$.

So, in order to get a lower bound for $C_p(n,m)$, we need to construct a simple path in $P_{n-1}$
of as great a length as possible (ideally a Hamiltonian path). In order to get a Hamiltonian
path or a path that is "close" to a Hamiltonian one, we can try to use methods and
techniques similar to those used in constructions of universal cycles for various combinatorial
structures such as words, permutations, partitions, and others (e.g. see [?]).

We briefly discuss the general notion of a universal cycle (see [?]).

Suppose we are given a family $\mathcal{F}_n$ of combinatorial objects of "rank $n$" and let $m := |\mathcal{F}_n|$
denote their number. We assume that each $F \in \mathcal{F}_n$ is "generated" or specified by some
sequence $x_1x_2 \cdots x_n$, where $x_i \in A$ for some fixed alphabet $A$. We say that $U = a_0a_1 \ldots a_{m-1}$
is a universal cycle (or a U-cycle) for $\mathcal{F}_n$ if $a_{i+1}a_{i+2} \ldots a_{i+n}$, $0 \le i < m$, runs through each
element of $\mathcal{F}_n$ exactly once, where index addition is performed modulo $m$.
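The definition can be tested mechanically on a small family. The sketch below is illustrative only: the function name `is_universal_cycle` is hypothetical, and the family used is the set of binary words of length 3, for which a U-cycle is a classical de Bruijn cycle.

```python
from itertools import product


def is_universal_cycle(cycle, objects, n):
    """True if the cyclic windows of length n of `cycle` list each object exactly once."""
    m = len(cycle)
    windows = [tuple(cycle[(i + k) % m] for k in range(n)) for i in range(m)]
    objects = set(objects)
    # Equal sizes plus equal sets rules out both missing and repeated objects.
    return m == len(objects) and set(windows) == objects


binary_words = set(product("01", repeat=3))
print(is_universal_cycle("00010111", binary_words, 3))
print(is_universal_cycle("00010110", binary_words, 3))
```

The first sequence visits all eight binary words of length 3 once; the second repeats 000 and misses 111, so it is rejected.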

In our case the combinatorial objects are patterns of length $n$, and as in many other cases
(e.g. de Bruijn cycles, permutations, partitions), but not in all cases (e.g. $k$-subsets of an
$n$-set), it is possible to define a directed transition graph, namely the graph of pattern overlaps
$P_n$, and reduce the problem of constructing a U-cycle to constructing a Hamiltonian circuit
for $P_n$. Even though we do not need a Hamiltonian circuit (since we are concerned with
paths of maximal length), we can still try to use the same techniques as in [?] and
in references therein.
However, it turns out that the abovementioned techniques work only for $m = 2$, which
we are not interested in since we have an explicit result in this case (see Theorem 3.7). The
main problem is that the graph of pattern overlaps is not balanced, i.e. we have vertices
where the indegree is not equal to the outdegree. Also, $P_n$ is not the line graph of $P_{n-1}$.
However, it is possible to prove the following statement.

Theorem 3.8. The graph of pattern overlaps $P_n$ contains a Hamiltonian circuit.

Proof. We first observe that $P_n$ is strongly connected. Indeed, suppose we are given two
vertices of $P_n$, namely $X = x_1x_2 \cdots x_n$ and $Y = y_1y_2 \cdots y_n$. If $I$ denotes the vertex $11 \ldots 1$,
then we can find a path $P_X$ from $X$ to $I$. Indeed, if $x_i$ is the largest letter in $X$, then we
consider the following path in $P_n$:

$$X = x_1x_2 \cdots x_n \to x_2x_3 \cdots x_nx_1 \to \cdots \to x_ix_{i+1} \cdots x_{i-1} \to x_{i+1} \cdots x_{i-1}1 = X'.$$

Thus, in $X'$ we get 1 in place of the largest letter of $X$. We observe that $X'$ is obviously a
pattern. Clearly, we can continue this path by replacing the largest letters, one by one, with
1's until we arrive at $I$. On the other hand, it is easy to see that the operation of changing a
largest letter to 1 is invertible. For instance, in order to find a path from $X'$ to $X$, we may
do the following sequence of steps:

$$X' = x_{i+1} \cdots x_{i-1}1 \to x_{i+2} \cdots x_{i-1}1x_{i+1} \to \cdots \to 1x_{i+1} \cdots x_nx_1 \cdots x_{i-1} \to x_{i+1} \cdots x_nx_1 \cdots x_{i-1}x_i \to x_{i+2} \cdots x_ix_{i+1} \to \cdots \to x_1x_2 \cdots x_n = X.$$

Thus, we can find a path from $I$ to $Y$, which together with the path $P_X$ gives a path from $X$
to $Y$. Similarly, one can get a path from $Y$ to $X$, which proves that $P_n$ is strongly connected.
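The first half of this argument, the walk from $X$ to $I = 11 \ldots 1$, can be written out as a short procedure. This is a sketch under the assumption that patterns are tuples of integers; the names `is_pattern` and `path_to_all_ones` are hypothetical and the code is not taken from the paper.

```python
def is_pattern(word):
    """True if `word` uses exactly the letters 1, 2, ..., k for some k >= 1."""
    letters = set(word)
    return len(word) > 0 and letters == set(range(1, len(letters) + 1))


def path_to_all_ones(x):
    """Walk in P_n from the pattern x to 11...1, replacing largest letters by 1."""
    current = tuple(x)
    path = [current]
    while max(current) > 1:
        # Rotate until a largest letter is in front; each rotation is an arc of P_n.
        while current[0] != max(current):
            current = current[1:] + current[:1]
            path.append(current)
        # Shift the largest letter out and shift a 1 in.
        current = current[1:] + (1,)
        path.append(current)
    return path


for vertex in path_to_all_ones((2, 3, 1, 3)):
    print(vertex)
```

Each step drops the first letter and appends one letter, so consecutive vertices are joined by an arc of the de Bruijn graph, and every vertex on the walk remains a pattern.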

The main property we use when proving that $P_n$ has a Hamiltonian circuit is illustrated in
Figure 4A. It says that if $C_1$ and $C_2$ are two circuits corresponding to different conjugacy
classes with respect to the shift, and there is an arc from $C_1$ to $C_2$, then there is an arc from
$C_2$ to $C_1$, and vice versa. Moreover, in all cases but one (see the discussion below), we can choose
these arcs as in Figure 4A, that is, once we leave $C_1$ at the vertex $xW$, we can come back,
after visiting $C_2$, at the vertex $Wx$, which is adjacent to $xW$ on the circuit $C_1$. The notation
$xW$ (resp. $Wx$) is used to indicate a pattern of length $n$ with the first (resp. last) letter $x$.
The only exception, where the picture differs from that in Figure 4A, is the loop $11 \ldots 1$:
there is only one circuit adjacent to it, namely the one generated by $11 \ldots 12$. In this case $xW$
coincides with $Wx$, which however does not affect our considerations below.



The basic idea: We show the existence of a Hamiltonian circuit iteratively, starting from 
any circuit corresponding to a conjugacy class with respect to the shift, and on each following 
iteration creating a new circuit that contains the previous one and has more vertices since 
it covers additional circuits corresponding to some conjugacy classes (by covering here we 
mean containing all the vertices from a circuit in our big circuit). Moreover, we construct 
the big circuit so that once it arrives at a new circuit corresponding to a conjugacy class, it 
uses all the vertices from that circuit before leaving. We keep doing that, using the fact that
$P_n$ is a disjoint union of the circuits corresponding to the conjugacy classes, until we create
a Hamiltonian circuit.

Figure 4. Circuits in $P_n$ (panels A and B).

Let $H_1$ be an arbitrary circuit corresponding to a conjugacy class with respect to the shift.
Now assume we have made $i$ iterative steps and obtained a circuit $H_i$. If $H_i$ covers all the vertices
of $P_n$, then we are done. Otherwise, on iteration $i + 1$ we proceed as follows.

The fact that $P_n$ is strongly connected ensures that there is an arc from a circuit $C$ covered
by $H_i$ to a circuit which is not covered by $H_i$. Our strategy is to start from the vertex where
$H_i$ arrives at $C$, then go around $C$ following $H_i$ vertex by vertex, until we reach the vertex
at which $H_i$ leaves $C$, checking at each step whether it is possible to extend $H_i$ according to
the following considerations.

Assume we are at the vertex $xW$ in $C$. If there is only one arc coming out of $xW$, namely
the arc to the vertex $Wx$ belonging to $C$, then we cannot extend $H_i$ at this step, so we need
to consider the next vertex $Wx$ instead. Otherwise, there are $j \ge 1$ arcs that come out
from $xW$ to $j$ different circuits corresponding to some conjugacy classes (we denote the set
of these circuits by $B$). The case $j = 1$ is shown in Figure 4A, if we assume $C = C_1$. In
this case there are two possibilities: either $C_2$ is covered by $H_i$ or not. In the first case we
cannot extend $H_i$, so we need to consider the vertex $Wx$ belonging to $C$ to proceed further.
In the second case, we can extend $H_i$ by going to the vertex $Wy$, then through the vertices
belonging to $C_2$ until we reach $yW$, and then coming back to $C$ at the vertex $Wx$.

When $j > 1$, either all circuits from $B$ are already covered by $H_i$, or there are
circuits in $B$ that are not covered by $H_i$ (we denote the set of these circuits by $B_0$). In the first
case we cannot extend $H_i$ and we need to proceed to the vertex $Wx$. We claim that in
the second case there is a path starting from the vertex $xW$, going through all the vertices
of the circuits in $B_0$ and coming to the vertex $Wx$. We can extend $H_i$ with this path.
This claim is not hard to prove for any $j$, for instance by induction. However, we only give
our proof in the case $j = 3$ (see Figure 4B) as it is easily generalizable.

In Figure 4B, $Wy$, $Wz$, and $Wu$ are representatives from the circuits $C_3$, $C_2$ and $C_1$
respectively, which belong to $B$. The key observation here is that any other circuit in $B$ is
as good as $C$, that is, e.g. we can go from $xW$ to any of the vertices $Wy$, $Wz$ and $Wu$, but
we can also go from, say, $uW$ to any of these vertices. If $B = B_0$, then we can start at $xW$,
go to $Wu$, go to $uW$ through $C_1$, then to $Wz$, then go to $zW$ through $C_2$, to $Wy$, to $yW$
through $C_3$ and finally come to $Wx$, in which case we have succeeded in extending $H_i$. If $B \ne B_0$, we
use the same procedure, simply skipping the circuits not in $B_0$. E.g. if $C_2 \notin B_0$, we change
the path above by going from $uW$ directly to $Wy$, again extending $H_i$.

Thus, we have constructed the circuit $H_{i+1}$ that contains more vertices than $H_i$ does. Since $P_n$
has finitely many vertices, $P_n$ must contain a Hamiltonian circuit. □

Remark 3.9. The proof of Theorem 3.8 can be simplified if we add exactly one circuit
corresponding to a conjugacy class at each iteration. Indeed, in this case we do not need
to consider the sets $B$ and $B_0$ used in the proof, as well as the illustration in Figure 4B.
Thus, once we find a circuit to add to the big circuit, we can start a new iteration. However,
we keep the more complicated proof since it helps to understand the structure of the graph of
pattern overlaps more deeply.

Remark 3.10. One can test how the algorithm for finding a Hamiltonian circuit in $P_n$ works
in the case $n = 3$ and $m = 3$ on Figure 3.

As an immediate corollary to Theorem 3.8 we have the following:

Corollary 3.11. For any $m$ and $n$, there exists a U-cycle for word patterns of length $n$ over
an $m$-letter alphabet.
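For small parameters the corollary can be checked by computer. The sketch below does not implement the iterative construction from the proof of Theorem 3.8; it swaps in a plain backtracking search for a Hamiltonian circuit of $P_n$ and then reads off a U-cycle from the first letters of the vertices. The names `is_pattern`, `hamiltonian_circuit` and `u_cycle` are hypothetical.

```python
from itertools import product


def is_pattern(word):
    """True if `word` uses exactly the letters 1, 2, ..., k for some k >= 1."""
    letters = set(word)
    return len(word) > 0 and letters == set(range(1, len(letters) + 1))


def hamiltonian_circuit(n, m):
    """Backtracking search for a Hamiltonian circuit in P_n over [m] (small cases only)."""
    vertices = [w for w in product(range(1, m + 1), repeat=n) if is_pattern(w)]
    vertex_set = set(vertices)
    start = vertices[0]
    path, used = [start], {start}

    def extend():
        if len(path) == len(vertices):
            return path[-1][1:] == start[:-1]  # is there an arc back to the start?
        for letter in range(1, m + 1):
            nxt = path[-1][1:] + (letter,)
            if nxt in vertex_set and nxt not in used:
                path.append(nxt)
                used.add(nxt)
                if extend():
                    return True
                path.pop()
                used.remove(nxt)
        return False

    return path if extend() else None


def u_cycle(circuit):
    """First letters of the vertices of a Hamiltonian circuit, read in order."""
    return [v[0] for v in circuit]


circuit = hamiltonian_circuit(3, 3)
print(u_cycle(circuit))
```

Because consecutive vertices of the circuit overlap in $n - 1$ letters, the cyclic windows of length $n$ of the resulting sequence are exactly the vertices of the circuit.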

The following proposition is easy to prove using elementary combinatorics. 
Proposition 3.12. The number of different word patterns of length $n$ on $m$ letters is

$$\sum_{i=1}^{m} \sum_{\substack{a_1 + \cdots + a_i = n \\ a_1 \ge 1, \ldots, a_i \ge 1}} \binom{n}{a_1, \ldots, a_i}.$$
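The count in the proposition can be compared with a direct enumeration. The sketch below assumes the reading of the formula as a sum of multinomial coefficients over compositions of $n$ into $i$ positive parts, with $i$ at most $m$ (terms with $i > n$ are empty); the names `compositions`, `multinomial`, `count_patterns_formula` and `count_patterns_brute_force` are hypothetical.

```python
from itertools import product
from math import factorial


def compositions(n, parts):
    """All ways to write n as an ordered sum of `parts` positive integers."""
    if parts == 1:
        yield (n,)
        return
    for first in range(1, n - parts + 2):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def multinomial(parts):
    """Multinomial coefficient (a_1 + ... + a_i)! / (a_1! ... a_i!)."""
    result = factorial(sum(parts))
    for a in parts:
        result //= factorial(a)
    return result


def count_patterns_formula(n, m):
    """Sum of multinomial coefficients; a_k is the number of occurrences of letter k."""
    return sum(multinomial(c)
               for i in range(1, min(n, m) + 1)
               for c in compositions(n, i))


def count_patterns_brute_force(n, m):
    """Count words over [m] of length n whose letter set is {1, ..., k} for some k."""
    return sum(1 for w in product(range(1, m + 1), repeat=n)
               if set(w) == set(range(1, len(set(w)) + 1)))


print(count_patterns_formula(3, 3), count_patterns_brute_force(3, 3))
```

For $n = m = 3$ both counts agree with the 13 vertices of the graph of pattern overlaps discussed above.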



Now, using the discussion in the beginning of the subsubsection, Theorem 3.8 and
Proposition 3.12, we obtain the following proposition.

Proposition 3.13. Cp{n,m) > ( j. 

As a final remark, we observe that another way to get the number of different word
patterns of length $n$ on $m$ letters is to use a correction in the formula for $T_p(n,m)$, as we did
when we obtained the upper bound for $C_p(n,m)$. But in this case the correction is $i$ rather
than $i - 1$, which says that we consider each conjugacy class with respect to shift and find
the number of elements in it. Thus, $i$ and $1/i$ cancel each other, and we get a combinatorial
proof of the following identity:

$$\sum_{i=1}^{m} \sum_{\substack{a_1 + \cdots + a_i = n \\ a_1 \ge 1, \ldots, a_i \ge 1}} \binom{n}{a_1, \ldots, a_i} = \sum_{i \mid n} \sum_{j=0}^{\min(i,m)-1} (-1)^j \binom{\min(i,m)-1}{j} \sum_{d \mid i} \mu(d) (\min(i,m)-j)^{i/d}.$$